forked from Dao-AILab/flash-attention
Draft
Dropout #101
micmelesse wants to merge 16 commits into main_perf from micmelesse/dropout
+853
−426
Conversation
micmelesse force-pushed the micmelesse/dropout branch from fa644ed to c922740 on December 2, 2024 at 14:41
This is a combination of 11 commits:
- save
- fix: dropout=0.0 works
- feat: dropout restrictions removed; failing tests
- test: reduced tests to simple cases
- test: failure is due to query + key padding mask, NOT varlen itself
- feat: varlen dropout fwd passes
- fix: varlen bwd dropout works!
- test: discovered bwd error for non-dropout cases for large seqlen
- save
- save
- use triton commit 3ca2f498e98ed7249b82722587c511a5610e00c4 -- now batched layout passes
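The commit list above tracks getting dropout working, including the dropout=0.0 edge case. As a rough illustration of the standard inverted-dropout scheme these commits test (this is a hypothetical NumPy helper, not the PR's Triton kernel): elements are kept with probability 1 - p and rescaled by 1/(1 - p) so the expected value of the attention scores is unchanged, and p = 0.0 must be a no-op.

```python
import numpy as np

def apply_dropout(scores, p, rng):
    # Hypothetical reference helper (not the PR's actual kernel).
    # Inverted dropout: keep each element with probability 1 - p and
    # rescale the kept elements by 1/(1 - p) so E[output] == input.
    if p == 0.0:
        return scores  # edge case from the commit log: dropout=0.0 is a no-op
    keep = rng.random(scores.shape) >= p
    return np.where(keep, scores / (1.0 - p), 0.0)

rng = np.random.default_rng(0)
x = np.ones((4, 4))
y = apply_dropout(x, 0.0, rng)  # unchanged when p == 0.0
```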
This is a combination of 63 commits:
- pick test case
- save philox offsets into metadata
- pass offset to ref
- common dropout mask
- simple dropout mask
- start dropout ref
- work on returning SD_Mask
- next with negative numbers
- reference is working
- dropout bwd ref
- failing case
- transfer rng_state properly
- save changes
- one dropout mask function
- save
- save
- minimize diff
- save
- use torch.where in backward
- save
- save
- save
- dk works! passes
- reference is working. TODO: attn_ref is broken
- varlen ref working
- attn failing case with ones
- attn_ref matches. fails with randn
- we are seeing failure with large sizes from dv
- save
- skip attn matrices
- compare the masks and find failing case
- rm cdiv_fn
- put dropout and alibi in common
- save
- compare masks
- save
- save
- pytorch ref is using tiles
- save
- save
- tl_rand_ref
- cache ref dropout mask
- new generate_dropout_mask_ref using tiling
- isolate failing varlen case
- simple dropout loop on k
- print rng_outputs
- save
- fwd kernel works
- save
- dv passed close to dk
- simple ref
- save
- separate dropped and scaled in ref and triton kernel
- ref changes working
- delta with dp
- find failing dv failures
- find failing case due to delta
- save
- delta from dp working
- bwd impl green
- enable test fwd
- save
- save
- delete kernels
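Two ideas recur in this commit list: saving philox offsets into metadata, and using torch.where in the backward pass. The underlying pattern (sketched here in NumPy with a seeded generator standing in for a Philox counter; the function names are hypothetical, not the PR's API) is that the forward pass stores only the RNG state rather than the dropout mask itself, and the backward pass regenerates the identical keep-mask from that state, so gradients flow only through kept elements with the same 1/(1 - p) scale.

```python
import numpy as np

def dropout_fwd(x, p, seed):
    # The seeded generator stands in for Philox(seed, offset); only `seed`
    # needs to be saved for backward, not the mask itself.
    rng = np.random.default_rng(seed)
    keep = rng.random(x.shape) >= p
    return np.where(keep, x / (1.0 - p), 0.0)

def dropout_bwd(grad_out, p, seed):
    # Regenerate the same keep-mask from the saved RNG state; gradients
    # flow only through kept elements, with the same 1/(1 - p) scale.
    rng = np.random.default_rng(seed)
    keep = rng.random(grad_out.shape) >= p
    return np.where(keep, grad_out / (1.0 - p), 0.0)
```

Replaying the RNG instead of materializing the mask is what keeps the memory footprint of a fused attention kernel flat, at the cost of recomputing the random bits in the backward pass.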
micmelesse force-pushed the micmelesse/dropout branch from c922740 to 788ecf6 on December 2, 2024 at 14:58
No description provided.